Mapping of Sequence Reads to the Reference Genomes ◾ 81
2.4.1.4 Extracting Alignments of a Chromosome
Sometimes, we may need to work with the alignments of a specific chromosome or a specific
region of the genome. With the following command, you can extract the alignments of
chromosome 11 into a separate file using “samtools view” (the BAM file must be
coordinate-sorted and indexed for this kind of query):
samtools view SRR769545_mem_sorted.bam NC_000011.10 > chr11_human.sam
You can use any of the reference sequence names in the RNAME field of the SAM/BAM
file, so you may need to display the content of the file to check how the reference sequences/
chromosomes are named.
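The two steps above (checking how the references are named, then pulling out a chromosome or a smaller region) can be sketched as two small shell helpers. This is only an illustration under stated assumptions: the function names “list_ref_names” and “extract_region” are hypothetical, and the coordinates in the example call are placeholders rather than values taken from the data set.

```shell
# List the reference sequence names declared in a SAM header read from stdin.
# Each @SQ header line carries the name in its SN: field.
list_ref_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Extract one region into a new BAM file.
# Usage: extract_region IN.bam REGION OUT.bam
# Region queries need an indexed BAM file, so an index is built first
# if none is found next to the input file.
extract_region() {
    if [ ! -f "$1.bai" ] && [ ! -f "$1.csi" ]; then
        samtools index "$1"
    fi
    samtools view -b -o "$3" "$1" "$2"
}

# Example calls (file names and coordinates are placeholders):
#   samtools view -H SRR769545_mem_sorted.bam | list_ref_names
#   extract_region SRR769545_mem_sorted.bam NC_000011.10:1000000-2000000 chr11_region.bam
```

The “-H” option prints only the header, so the names can be checked without displaying any alignment records. Writing the subset with “-b” keeps it in BAM format, which is usually more compact than SAM.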
2.4.1.5 Filtering and Counting Alignments in SAM/BAM Files
To filter alignments in a SAM/BAM file, we can pipe the output of “samtools view” to “grep”,
a Linux command that searches plain text for lines matching a regular expression.
For instance, to search for the alignments of chimeric reads, which carry the optional
“SA:” tag field, we can use the following:
samtools view SRR769545_mem_sorted.bam | grep 'SA:' | less -S
A chimeric read is one whose parts align to two distinct portions of the genome with little
or no overlap.
To count the number of chimeric reads, we can pipe the result to the “wc -l” command:
samtools view SRR769545_mem_sorted.bam | grep 'SA:' | wc -l
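A plain “grep” for “SA:” matches those characters anywhere in a line, including the read name or the quality string. A stricter variant is sketched below, assuming the tag appears in the standard form “SA:Z:” in one of the optional fields (column 12 onward). The function name “count_sa_records” is hypothetical and used only for illustration.

```shell
# Count SAM records read from stdin that carry an SA:Z: tag in an
# optional field (column 12 or later). Header lines are skipped.
count_sa_records() {
    awk -F'\t' '!/^@/ {
        for (i = 12; i <= NF; i++)
            if ($i ~ /^SA:Z:/) { n++; break }
    } END { print n + 0 }'
}

# Example call:
#   samtools view SRR769545_mem_sorted.bam | count_sa_records
```

Note that this counts alignment records, not reads. Each part of a chimeric alignment is typically stored as its own record with an “SA” tag, so one chimeric read may contribute more than one line to the count.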
We can also use the option “-c” with “samtools view” to count the number of reads in a
BAM file:
samtools view -c SRR769545_mem_sorted.bam
We can use values in the FLAG field of the SAM/BAM file to count the reads defined
by a specific FLAG value. For instance, since unmapped reads have the “0x4” bit set
in the FLAG field, we can count all mapped reads by excluding the unmapped ones
with the “-F” option:
samtools view -c -F 0x4 SRR769545_mem_sorted.bam
To count unmapped reads, use the “-f” option instead of the “-F” option:
samtools view -c -f 0x4 SRR769545_mem_sorted.bam
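The FLAG field is a sum of bit values, and the “-f” and “-F” options test those bits. The sketch below mimics the two tests with shell arithmetic to show why “0x4” separates mapped from unmapped reads. The helper names “keep_f” and “keep_F” are hypothetical, and the two FLAG values are common paired-end examples chosen for illustration, not values read from the file.

```shell
# keep_f FLAG MASK: true when every bit of MASK is set in FLAG (the "-f MASK" test).
keep_f() { [ $(( $1 & $2 )) -eq $(( $2 )) ]; }

# keep_F FLAG MASK: true when no bit of MASK is set in FLAG (the "-F MASK" test).
keep_F() { [ $(( $1 & $2 )) -eq 0 ]; }

# FLAG 77 = 1 + 4 + 8 + 64: paired, unmapped, mate unmapped, first in pair.
if keep_f 77 0x4; then echo "77: unmapped"; fi

# FLAG 99 = 1 + 2 + 32 + 64: paired, properly paired, mate on reverse strand, first in pair.
if keep_F 99 0x4; then echo "99: mapped"; fi
```

Because every record either has the “0x4” bit set or does not, the counts returned with “-F 0x4” and “-f 0x4” should add up to the total returned by “samtools view -c”.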
We can also use the “samtools view” command together with some Unix/Linux commands
and the pipe symbol “|” to perform more complex counts. For instance, we can count